Algorithms for Graph Similarity and Subgraph Matching
Abstract
We address two independent but related problems, graph similarity and subgraph matching, both of which are of practical importance in several fields of science, engineering and data analysis. For graph similarity, we develop and test a new framework based on belief propagation and related ideas. For subgraph matching, we develop a new algorithm, based on existing techniques in the bioinformatics and data mining literature, that uncovers periodic or infrequent matchings. We make substantial progress compared to the existing methods for both problems.

1 Problem Definitions and Statement of Contributions

1.1 Graph Similarity

Problem 1
Given: two graphs G1(n1, e1) and G2(n2, e2), with possibly different numbers of nodes and edges, and the mapping between the graphs' nodes.
Find: (a) an algorithm to calculate the similarity of the two graphs, which returns (b) a measure of similarity (a real number between 0 and 1) that captures intuition well.
Innovations:
a) We develop a method involving belief propagation, previously unseen in the literature, to solve this problem.
b) The method (and its fast linearized approximate version) gives results that agree closely with intuition.
c) Apart from scalability, we know of no shortcomings of this method.

1.2 Subgraph Matching

Problem 2
Given: a graph time series consisting of T graphs.
Find: (a) an algorithm to find approximate subgraphs that occur in a subset of the T graphs, (b) where the approximate subgraphs may not occur in the majority of the time points, but only in local sections of the time series.
Innovations:
a) We develop a principled approach to selecting the important time components from which subgraphs should be mined. Our method is also tailored to the problem of selecting subgraphs in biological networks. For this, we use sparse PCA, which has not been used in this application domain.
b) Scalability: our method is both fast and scalable to real biological data (thousands of nodes). However, it has not been demonstrated whether it can scale to extremely large networks of more than 10,000 nodes.
c) The method gives results that are easy to interpret and biologically sensible.

Disclaimer of interests intersecting with course project
Aaditya may use the PhoneCall dataset for his DAP. Danai is interested in graph similarity and belief propagation for research. Ankur has used tensors for his research, but in a different context. Jing has used CODENSE before, and is interested in improving it for research purposes. None of the authors have other course projects this term.

The following are the papers read for this course (refer to the numbering in the references section): Jing [21], [28], [22]; Ankur [20], [18], [9]; Aaditya [5], [26], [27]; Danai [10], [14], [15].
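To make the input/output contract of the graph similarity problem concrete, the following is a minimal Python sketch of a naive baseline: the Jaccard overlap of the two edge sets after applying the given node mapping. This is not the belief-propagation method proposed by the authors; it only illustrates the setting (two undirected graphs, a known node correspondence, a score between 0 and 1). The function name `edge_overlap_similarity`, the edge-list representation, and the handling of unmapped nodes are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_overlap_similarity(
    edges1: Iterable[Edge],
    edges2: Iterable[Edge],
    mapping: Dict[Hashable, Hashable],
) -> float:
    """Jaccard similarity of two undirected edge sets (illustrative baseline only).

    `mapping` sends nodes of the first graph to their counterparts in the
    second graph. Nodes of the first graph that have no counterpart are kept
    distinct (an assumption), so their edges can never match.
    """

    def translate(node: Hashable) -> Hashable:
        # Unmapped nodes get a tagged label so they cannot collide with G2 nodes.
        return mapping.get(node, ("unmapped", node))

    # Undirected edges are stored as frozensets; self-loops are ignored.
    a: Set[FrozenSet] = {
        frozenset((translate(u), translate(v))) for u, v in edges1 if u != v
    }
    b: Set[FrozenSet] = {frozenset((u, v)) for u, v in edges2 if u != v}

    union = a | b
    if not union:
        # Two edgeless graphs are treated as identical (a convention, not a result).
        return 1.0
    return len(a & b) / len(union)
```

For example, a triangle on nodes 1, 2, 3 compared with the path a-b-c under the mapping 1→a, 2→b, 3→c shares two of three distinct edges, so the score is 2/3. A baseline like this treats every edge as equally important, which is one way it can diverge from an intuitive notion of similarity.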
Similar resources
Graph Similarity and Matching
Measures of graph similarity have a broad array of applications, including comparing chemical structures, navigating complex networks like the World Wide Web, and more recently, analyzing different kinds of biological data. This thesis surveys several different notions of similarity, then focuses on an interesting class of iterative algorithms that use the structural similarity of local neighbo...
Challenging Complexity of Maximum Common Subgraph Detection Algorithms: A Performance Analysis of Three Algorithms on a Wide Database of Graphs
Graphs are an extremely general and powerful data structure. In pattern recognition and computer vision, graphs are used to represent patterns to be recognized or classified. Detection of maximum common subgraph (MCS) is useful for matching, comparing and evaluating the similarity of patterns. MCS is a well known NP-complete problem for which optimal and suboptimal algorithms are known from the l...
Neighbor-Aware Search for Approximate Labeled Graph Matching using the Chi-Square Statistics
Labeled graphs provide a natural way of representing entities, relationships and structures within real datasets such as knowledge graphs and protein interactions. Applications such as question answering, semantic search, and motif discovery entail efficient approaches for subgraph matching involving both label and structural similarities. Given the NP-completeness of subgraph isomorphism and t...
Graph matching: filtering databases of graphs using machine learning techniques
Graphs are a powerful concept useful for various tasks in science and engineering. In applications such as pattern recognition and information retrieval, object similarity is an important issue. If graphs are used for object representation, then the problem of determining the similarity of objects turns into the problem of graph matching. Some of the most common graph matching paradigms include...
Extending graph homomorphism and simulation for real life graph matching
Among the vital problems in a variety of emerging applications is the graph matching problem, which is to determine whether two graphs are similar, and if so, find all the valid matches in one graph for the other, based on specified metrics. Traditional graph matching approaches are mostly based on graph homomorphism and isomorphism, falling short of capturing both structural and semantic simil...
An efficient least common subgraph algorithm for video indexing
Many tasks in computer vision can be expressed as graph problems. This allows the task to be solved using a well studied algorithm, however, many of these algorithms are of exponential complexity. This is a disadvantage when considered in the context of searching a database of images or videos for similarity. Recent work by Messmer and Bunke has suggested a new class of graph matching algorithms...